Job Opportunities at the Internet Archive

Visual Data Curation Fellow

Location: Inner Richmond, San Francisco, CA

Job Summary: In collaboration with the Council on Library and Information Resources (CLIR), the Internet Archive seeks an innovative, technologically savvy Visual Data Curation Fellow to work with our diverse team of archivists, computer scientists, digitization specialists, scholars, and the public to advance data curation of our visual collections. This fellowship requires both excellent technical skills and a deep understanding of the rich constellation of information that surrounds visual objects. The Fellow will apply both to enhance some of the world's largest visual collections: film, video, images from texts, web images, historic software, and music-related images. The Fellow will bring his/her subject expertise to every aspect of our collections work: identifying, preparing, and digitizing different types of data; merging disparate metadata sources and developing schemas for enhanced descriptions; data mining; transferring data between institutions; reaching out to external visual collections; and supporting scholars and filmmakers who use our collections. We are building the digital libraries of the future, and developing processes that let users discover, access, analyze, and add their own material to our visual collections is a core part of our mission.

For 18 years the Internet Archive has been building a vast online library of our shared human culture. Our library now holds 19 petabytes of data, including over 3 million digitized books (900 million text pages), 430 billion web pages, 650,000 hours of television news, and 500,000 software applications. Each day two million visitors use this library, making www.archive.org one of the world's top 200 sites. We work with institutions around the globe, scanning books on five continents and archiving web pages for 350 institutions, including the Library of Congress, Stanford University, the Smithsonian Institution, and the U.S. Senate. The Internet Archive is also home to the Prelinger Archives, a collection of more than 6,500 historic films and visual ephemera. It is the first and largest free repository of archival moving image materials available for unrestricted viewing, downloading, and reuse. Our collections support digital humanities initiatives such as Harvard University's Digital Archive of Japan's 2011 Disasters and curricula at the University of Virginia, and they are widely used by scholars, researchers, and educators worldwide. Our archivists and technologists present the Archive's cutting-edge work at conferences around the world, including recent talks at the 2014 Wolfram Data Summit, the European Research Council's Alexandria Project, and re:publica, Europe's leading conference on Internet and Society. Each year we host the Library Leaders Forum, where senior leaders of key libraries come together to tackle large, highly collaborative opportunities.

The Visual Data Curation Fellow will bring his/her training and understanding of the field to help the Internet Archive leverage our work in these key ways: 1) helping to develop workflows for documenting, archiving, and refining existing datasets for our film collections, especially a new donation of more than 7,000 rare educational films; 2) building partnerships with libraries, special collections, and scholars to enhance discovery, access, and use of our visual holdings; 3) developing use cases for these visual primary sources and building relationships with user communities; and 4) joining our team of computer scientists, user-interface designers, and archivists who are building and testing the next generation of tools to encourage more collection building by our users.

The Fellow will be based at the Internet Archive's headquarters in San Francisco, where computer scientists, film and book scanners, archivists, and our founder and Digital Librarian, Brewster Kahle, come together in a unique and dynamic culture. Every Friday, guests from other libraries and universities, start-ups, open Internet activists, and web pioneers join our entire staff for a communal lunch where everyone shares highlights of his/her work. Our open workspace encourages collaboration and information sharing, and weekly staff meetings include demonstrations of our newest digital technologies. The Fellow will be supervised by Alexis Rossi, Director of Web Services, who currently manages all aspects of Internet Archive collections work for movies, audio, software, television, and books, and runs the Wayback Machine and user access projects. The Fellow will be mentored by Rick Prelinger, archivist, filmmaker, Internet Archive Board Member, and currently Associate Professor of Film and Digital Media at the University of California, Santa Cruz.

Within the Internet Archive's larger world, the Fellow will work directly alongside technology leaders, including Brewster Kahle, as they develop big data initiatives, digital strategies, breakthrough technologies, user interface designs, and products. Few libraries in the world work with as much culturally rich data, and the opportunities to experiment with big data sets are unparalleled. The Fellow will also have opportunities to participate in the Library Leaders Forum and other events at the nexus of Silicon Valley technology and information science.

The successful candidate will have a PhD in Visual Studies, Film, Photography, or Library and Information Science, or in a related discipline. The candidate must have strong technical aptitude, be able to do basic coding, and be able to apply skills in data identification, data preparation, data ingest, and metadata generation.

Job Duties

  • Develop and implement workflows for film collections, including film identification, film preparation, data ingest, and metadata generation.
  • Work with a potentially wide range of data formats (including all types of film, video, software, audio, and web captures) and ensure adherence to leading data curation practices.
  • Develop ongoing partnerships with libraries, special collections, scholars and researchers to enhance their understanding, discovery and use of the Internet Archive's collections
  • Promote the Internet Archive's Visual Collections through papers, blogs, presentations, and workshops
  • Work with cross-functional teams to design innovative ways to visualize and display our data
  • Participate in the strategic development, design, and testing of a next-generation user interface and toolset for www.archive.org
  • Conduct outreach to curators of special collections who might want to digitally preserve their visual collections within the Internet Archive

Required Knowledge and Skills

  • A PhD in a relevant field of Visual Studies, Film or Library and Information Sciences
  • Demonstrated ability to work collaboratively and successfully in a team-based environment
  • Ability to blend visual expertise with technical expertise to develop best practices for visual collections
  • Excellent verbal and written communication skills
  • Excellent work ethic and a self-starting attitude

Desired Knowledge and Skills

  • Strong facility with all forms of technology and software, especially databases
  • Demonstrated ability in software development, with proficiency in Python
  • Familiarity with metadata standards, including Dublin Core
  • Familiarity with diverse film formats, quality control of film transfers, and a basic understanding of digitization standards

Contract: Two years. Salary: $60,000 plus benefits. Also included: a $6,000 stipend for travel to nearby universities and to conferences. The position runs from 1 July 2015 to 30 June 2017. We thank CLIR and the Andrew W. Mellon Foundation for a generous grant supporting this position. For more information, visit CLIR.

To Apply: Applications are being accepted here until December 29, 2014.

Internet Archive is an Equal Opportunity Employer. Internet Archive complies with the Fair Chance Ordinance.

About the Internet Archive: Internet Archive is a 501(c)(3) non-profit library founded in 1996. Our motto is Universal Access to All Knowledge. We collect web sites, books, audio, videos, software, and other types of media and make them available to the world for free.

Wayback Machine Senior Engineer

Location: Inner Richmond, San Francisco, CA

Job Classification: Full-time, exempt

Job Summary: The Internet Archive's Wayback Machine is the world's largest public archive of historical web sites. Have you ever wanted to work with 400 billion things at once? Would you like to serve 1,500 requests per second? How about having your service referred to regularly in news articles and blog posts across the web? You can work on a challenging and popular project and help the world at the same time.

We are looking for a smart, collaborative, and resourceful engineer to help develop the next version of the Wayback Machine. We are also looking for someone who wants to dive in and help us figure out better processes, which may include code review, technical oversight, and pair programming. The ideal candidate will want to work collaboratively with a small internal team and a large, vocal, and active user community, and will demonstrate independence, creativity, initiative, and technological savvy in addition to being a great programmer/architect.

Minimum Qualifications:

  • 2-3 years of work experience in Python or a similar language
  • Experience working in a Linux environment
  • Familiarity with Java (the current deployment is written in Java)
  • Good understanding of current web frameworks, web technologies, and protocols
  • Flexibility and a sense of humor
  • BS in Computer Science or equivalent work experience

Preferred Qualifications:

  • Experience with web crawlers and/or applications designed to display archived web content (especially server-side apps)
  • Cluster computing experience
  • Experience with open source practices

To apply: Please send your resume and cover letter to jobs@archive.org with the subject line "Wayback Machine Senior Engineer."

Software Project Engineer

Location: Inner Richmond, San Francisco, CA

Job Classification: Full-time, exempt

Job Summary/Goal: Our non-profit mission at the Internet Archive is to create a digital library of Internet sites and other cultural artifacts in digital form, providing free access to researchers, historians, scholars, and the general public.

Key Responsibilities: This position supports a global Books digitization division that works with both born-digital items and items that are not born digital. The position's software and technical support responsibilities include:

  • Keeping the digitization pipeline working and efficient: more than 4,000 eBooks per week are digitized, uploaded, and processed from more than 30 centers on 4 continents.
  • Learning from user behavior to keep downloads growing: in any given month, 20 million items are downloaded from our web sites, which rank among the top 200 web sites globally.
  • Managing more than 1 petabyte of data. The questions we are asking include: Is it in the right format? Does it have the right features and contemporary tools for discovery and analysis?
  • Handling an expanding range of content: the types of digital data are changing and are no longer limited to books or printed material, so the successful engineer will be part of an expanding universe of content that will be captured, posted, and shared.
  • Supporting a diverse, dedicated staff of 75 to 300 people, depending on contracts, funding, or commitments. This person will need to answer questions and solve problems that come up as part of the operation.

Reporting Structure: The Software Project Engineer reports to the Global Director of Books. The Software Project Engineer will work closely with the Process Manager, who will help define and detail the needs of the user community. The Software Project Engineer will also work closely with the technical team that supports the server infrastructure, with the teams responsible for audio, video, TV, and other media content, and with the web team.


  • Programming - A wide variety of programming languages are used at the Internet Archive. PHP and JavaScript are the most common, followed by Python; other languages will be used as needed. Web app UI design experience is a plus, as is experience working with and creating APIs.
  • Problem solving - A creative, flexible approach that combines teamwork with individual initiative is expected. Quick, functional solutions are the norm, rather than complex, elegant designs.
  • Code - There is a large code base that needs to be maintained, improved where necessary, and rewritten as appropriate. A bug-tracking system helps the software engineer know where to focus his/her efforts.
  • Type of problems - These range from crises (e.g., a production center is down) to planned improvements that raise quality or reduce cost. A Process Manager will help the Software Project Engineer strike the right balance of support.
  • This individual will need to communicate clearly and consistently with his/her management on issues that affect cost, productivity, quality, or Partner Library relations.
  • The position is part of the culture of the Books Division and of the headquarters staff. Attitude, positive energy, and creative force will be vital not only to his/her personal success but also to his/her contribution to the overall team and support of the non-profit mission.

Minimum Qualifications:

  • Proven, successful experience working in an Internet non-profit environment would be helpful but is not required.
  • Comfortable in a project-based, decentralized work environment.
  • Exposure to or knowledge of library metadata standards (e.g., MARC, Dublin Core, METS) or media formats such as audio, video, or TV would be a plus.
  • Experience programming embedded devices (e.g., cameras).
  • Experience with GUI development.
  • Experience with large numbers (billions) of files and a wide variety of file formats is a plus.
  • Personality-wise, good cheer, being a team player, conscientiousness, and hard work will help this person succeed and accelerate their inclusion in the team (think of a start-up environment, but without quite the same killer hours).

To Apply: Please send your resume and cover letter to jobs@archive.org with the subject line "Software Project Engineer."

Terms of Use (31 Dec 2014)